Learning Generative Models with Visual Attention
نویسندگان
چکیده
Attention has long been proposed by psychologists to be important for efficiently dealing with the massive amounts of sensory stimulus in the neocortex. Inspired by the attention models in visual neuroscience and the need for object-centered data for generative models, we propose a deep-learning based generative framework using attention. The attentional mechanism propagates signals from the region of interest in a scene to an aligned canonical representation for generative modeling. By ignoring scene background clutter, the generative model can concentrate its resources on the object of interest. A convolutional neural net is employed to provide good initializations during posterior inference which uses Hamiltonian Monte Carlo. Upon learning images of faces, our model can robustly attend to the face region of novel test subjects. More importantly, our model can learn generative models of new faces from a novel dataset of large images where the face locations are not known.
منابع مشابه
A Hierarchical Generative Model of Recurrent Object-Based Attention in the Visual Cortex
In line with recent work exploring Deep Boltzmann Machines (DBMs) as models of cortical processing, we demonstrate the potential of DBMs as models of object-based attention, combining generative principles with attentional ones. We show: (1) How inference in DBMs can be related qualitatively to theories of attentional recurrent processing in the visual cortex; (2) that deepness and topographic ...
متن کاملUnsupervised learning by completing partially occluded CAD renderings
Unsupervised learning has received a lot of attention in the recent past and has been applied to multiple domains images, videos, ego-centric motions, etc. In this project, we propose to use a large corpus of 3D CAD models to train an unsupervised representation. We follow the Deep Convolutional Generative Adversarial Network (DCGAN) framework to train a generative and a discriminative model. G...
متن کاملBest of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
We present a novel training framework for neural sequence models, particularly for grounded dialog generation. The standard training paradigm for these models is maximum likelihood estimation (MLE), or minimizing the cross-entropy of the human responses. Across a variety of domains, a recurring problem with MLE trained generative neural dialog models (G) is that they tend to produce ‘safe’ and ...
متن کاملVisual-textual Attention Driven Fine-grained Representation Learning
Fine-grained image classification is to recognize hundreds of subcategories belonging to the same basic-level category, which is a highly challenging task due to the quite subtle visual distinctions among similar subcategories. Most existing methods generally learn part detectors to discover discriminative regions for better classification accuracy. However, not all localized parts are benefici...
متن کاملOne-Shot Generalization in Deep Generative Models
Humans have an impressive ability to reason about new concepts and experiences from just a single example. In particular, humans have an ability for one-shot generalization: an ability to encounter a new concept, understand its structure, and then be able to generate compelling alternative variations of the concept. We develop machine learning systems with this important capacity by developing ...
متن کامل